Collaborative Research Project
Next Class
Review
Static maps with ggmap
Dynamic results presentation
Static website hosting with gh-pages
20 November 2014
Purposes: Pose an interesting research question and try to answer it using data analysis and standard academic practices. Effectively communicate your results to a variety of audiences in a variety of formats.
Deadlines:
Presentation: In-class Friday 5 December
Website/Paper: 12 December
The project is a 'dry run' for your thesis with multiple presentation outputs.
Presentation: 10 minutes maximum. Engagingly present your research question and key findings to a general academic audience (fellow students).
Paper: 6,000 words maximum. Standard academic paper, properly cited, laying out your research question, literature review, data, methods, and findings.
Website: An engaging website designed to convey your research to a general audience.
As always, you should submit one GitHub repository with all of the materials needed to completely reproduce your data gathering, analysis, and presentation documents.
Note: Because you've already had two assignments to work on parts of the project, I expect high quality work.
Next class is the final class before the presentations.
Mostly it is an opportunity for you to work on your project and ask me questions.
Be prepared.
What is the data-ink ratio? Why is it important for effective plotting?
What is visual weighting?
Why should you avoid using the size of circles to represent the values of continuous variables?
How many decimal places should you report in a table?
Last class we didn't have time to cover mapping with ggmap.
We've already seen how ggmap can be used to find latitude and longitude.
library(ggmap)
places <- c('Bavaria', 'Seoul', '6 Parisier Platz, Berlin',
'Hertie School of Governance')
geocode(places)
##         lon      lat
## 1  11.49789 48.79045
## 2 126.97797 37.56654
## 3  13.37854 52.51701
## 4  13.38921 52.51286
qmap(location = 'Berlin', zoom = 15)
Example from: Kahle and Wickham (2013)
Use the crime data set that comes with ggmap.
names(crime)
##  [1] "time"     "date"     "hour"     "premise"  "offense"  "beat"
##  [7] "block"    "street"   "type"     "suffix"   "number"   "month"
## [13] "day"      "location" "address"  "lon"      "lat"
# find a reasonable spatial extent
qmap('houston', zoom = 13) # then run gglocator(2) in RStudio and click two corners
# only violent crimes
violent_crimes <- subset(crime,
    offense != "auto theft" & offense != "theft" &
    offense != "burglary")
# order violent crimes
violent_crimes$offense <- factor(violent_crimes$offense,
    levels = c("robbery", "aggravated assault", "rape", "murder"))
# restrict to downtown
violent_crimes <- subset(violent_crimes,
    -95.39681 <= lon & lon <= -95.34188 &
    29.73631 <= lat & lat <= 29.78400)
# Set up base map
HoustonMap <- qmap("houston", zoom = 14,
    source = "stamen", maptype = "toner",
    legend = "topleft")
# Add points
FinalMap <- HoustonMap +
    geom_point(aes(x = lon, y = lat, colour = offense,
                   size = offense),
               data = violent_crimes) +
    guides(size = guide_legend(title = 'Offense'),
           colour = guide_legend(title = 'Offense'))
print(FinalMap)
When your output documents are in HTML, you can create interactive visualisations.
These are potentially more engaging and can let users explore the data on their own.
Big distinction:
Client Side: Plots are created on the user's (client's) computer. Often JavaScript in the browser. You simply send them static HTML/JavaScript needed for their browser to create the plots.
Server Side: Data manipulations and/or plots (e.g. with Shiny Server) are done on a server in R. Browsers don't come with R built in.
There are lots of free services (e.g. GitHub Pages) for hosting webpages for client side plot rendering.
You usually have to use a paid service for server side data manipulation and plotting.
You can (relatively) easily create server side web applications with R.
To do this use Shiny.
We are not going to cover Shiny in the class because hosting it usually requires a paid service.
You already know how to create HTML documents with R Markdown.
Set your code chunk to results='asis'.
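A minimal sketch of one way to set this: packages that print raw HTML/JavaScript need knitr to pass that output through untouched, which is what results='asis' does. The option normally goes in the header of each chunk in the .Rmd file. Setting it once as a document-wide default, as below, is an assumption about how you might organise your file, not a course requirement.

```r
library(knitr)

# Assumption: make results='asis' the default for every later chunk.
# The per-chunk equivalent is writing results='asis' in the chunk header.
opts_chunk$set(results = 'asis')

# Confirm the option that knitr will now use
opts_chunk$get('results')
```

If only one or two chunks produce HTML, setting the option on just those chunks is probably the safer choice, since ordinary printed output in other chunks would otherwise also be treated as raw markup.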
There is a growing set of tools for interactive plotting:
These packages simply create an interface between R and (usually) JavaScript.
Debugging often requires some knowledge of JavaScript and the DOM.
In sum: usually simple, but can be mysteriously difficult without a good knowledge of JavaScript/HTML.
The googleVis package can create Google plots from R.
Example from googleVis Vignettes.
# Create fake data
fake_compare <- data.frame(
    country = c("US", "GB", "BR"),
    val1 = c(10, 13, 14),
    val2 = c(23, 12, 32))
library(googleVis)
line_plot <- gvisLineChart(fake_compare)
print(line_plot, tag = 'chart')
Note: To show in R use plot instead of print and don't include tag = 'chart'.
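For illustration, the interactive variant described in the note might look like the sketch below. It rebuilds the same fake data so that it runs on its own. The interactive() guard is an addition here, so that nothing tries to open a browser when the code is run in a batch job.

```r
library(googleVis)

# Same fake data as in the line chart example
fake_compare <- data.frame(
    country = c("US", "GB", "BR"),
    val1 = c(10, 13, 14),
    val2 = c(23, 12, 32))

line_plot <- gvisLineChart(fake_compare)

# In an interactive R session, plot() opens the chart in your browser.
# Note: no tag = 'chart' argument here.
if (interactive()) {
    plot(line_plot)
}
```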
library(WDI)
co2 <- WDI(indicator = 'EN.ATM.CO2E.PC', start = 2010, end = 2010)
co2 <- co2[, c('iso2c','EN.ATM.CO2E.PC')]
# Clean
names(co2) <- c('iso2c', 'CO2 Emissions per Capita')
co2[, 2] <- round(log(co2[, 2]), digits = 2)
# Plot
co2_map <- gvisGeoMap(co2, locationvar = 'iso2c',
    numvar = 'CO2 Emissions per Capita',
    options = list(
        colors = '[0xfff7bc, 0xfec44f, 0xd95f0e]'
    ))
Note that 0x replaces # for hexadecimal colors.
CO2 Emissions (metric tons per capita)
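If you pick a palette in R's usual '#rrggbb' notation, you could convert it to the colors string rather than retyping it. The helper below is hypothetical (hex_to_gvis is not part of googleVis) and only does string manipulation. It is a sketch that assumes every input starts with a single '#'.

```r
# Hypothetical helper (not part of googleVis): turn R-style hex colours
# ('#rrggbb') into the '[0xrrggbb, ...]' string used in the colors option
hex_to_gvis <- function(hex_colours) {
    converted <- sub('^#', '0x', hex_colours)
    paste0('[', paste(converted, collapse = ', '), ']')
}

palette_string <- hex_to_gvis(c('#fff7bc', '#fec44f', '#d95f0e'))
palette_string
# [1] "[0xfff7bc, 0xfec44f, 0xd95f0e]"
```

The result can then be passed as options = list(colors = palette_string).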
print(co2_map, tag = 'chart')
Note: you will need to view googleVis maps that are in R Markdown documents in your browser rather than in RStudio's built-in HTML viewer.
More examples are available at: http://hertiedatascience2014.github.io/Examples/
Any file called index.html in a GitHub repository branch called gh-pages will be a hosted website.
The URL will be:
http://GITHUB_USER_NAME.github.io/REPO_NAME
Note: you can use a custom URL if you own one. See https://help.github.com/articles/setting-up-a-custom-domain-with-github-pages/
First create a new branch in your repository called gh-pages.
Then sync your branch with the local version of the repository.
Finally switch to the gh-pages branch.
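If you prefer to stay in R rather than use the GitHub interface, the same result can be sketched by calling command-line git through system2(). The create_gh_pages() helper is hypothetical. It assumes git is installed and on your PATH, that the R working directory is the root of your local clone, and that the GitHub remote is named origin. The order also differs from the steps above: the branch is created and switched to locally first, then pushed.

```r
# Hypothetical helper: create a gh-pages branch from R by calling git.
# Assumptions: git is on the PATH, the working directory is the root of
# your local clone, and the GitHub remote is named 'origin'.
create_gh_pages <- function() {
    # Create the branch locally and switch to it
    system2('git', c('checkout', '-b', 'gh-pages'))
    # Publish the new branch to GitHub and track it
    system2('git', c('push', '-u', 'origin', 'gh-pages'))
}
```

Calling create_gh_pages() once from the repository root should leave you on the new branch with a matching branch on GitHub.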
You can use R Markdown to create the index.html page.
Simply place a new .Rmd file in the repository called index.Rmd and knit it to HTML. Then sync it.
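Knitting can also be done from the console instead of with RStudio's Knit button. The sketch below uses rmarkdown::render() and assumes index.Rmd sits in the root of the repository, which is also the working directory. The file.exists() guard is an addition so the snippet does nothing if the file is missing.

```r
library(rmarkdown)

# Assumption: index.Rmd is in the current working directory.
# render() writes index.html next to it.
if (file.exists('index.Rmd')) {
    render('index.Rmd', output_format = 'html_document')
}
```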
Your website will now be live.
Every time you push to the gh-pages branch, the website will be updated.
Note: branches in git repositories can have totally different files in them.
Begin to create a website for your project with static and interactive graphics.
If relevant include:
A table of key results
A googleVis map
A bar or line chart with googleVis or other package
A simulation plot created with Zelig showing key results from your regression analysis.
Push to the gh-pages branch.